Mining genomic regions associated with agronomic and biochemical traits in quinoa through GWAS

Quinoa (Chenopodium quinoa Willd.), an Andean crop, is a facultative halophyte food crop recognized globally for its high nutritional value and plasticity to adapt to harsh conditions. We conducted a genome-wide association study on a diverse set of quinoa germplasm accessions. These accessions were evaluated for the following agronomic and biochemical traits: days to 50% flowering (DTF), plant height (PH), panicle length (PL), stem diameter (SD), seed yield (SY), grain diameter (GD), and thousand-grain weight (TGW). These accessions underwent genotyping-by-sequencing using the DNBSeq-G400R platform. Among all evaluated traits, TGW represented maximum broad-sense heritability. Our study revealed average SNP density of ≈ 3.11 SNPs/10 kb for the whole genome, with the lowest and highest on chromosomes Cq1B and Cq9A, respectively. Principal component analysis clustered the quinoa population in three main clusters, one clearly representing lowland Chilean accessions, whereas the other two groups corresponded to germplasm from the highlands of Peru and Bolivia. In our germplasm set, we estimated linkage disequilibrium decay to be ≈ 118.5 kb. Marker-trait analyses revealed major and consistent effect associations for DTF on chromosomes 3A, 4B, 5B, 6A, 7A, 7B and 8B, with phenotypic variance explained (PVE) as high as 19.15%. Nine associations across eight chromosomes were also found for saponin content with 20% PVE by qSPN5A.1. More QTLs were identified for PL and TGW on multiple chromosomal locations. We identified putative candidate genes in the genomic regions associated with DTF and saponin content. The consistent and major-effect genomic associations can be used in fast-tracking quinoa breeding for wider adaptation across marginal environments.


Phenotypic evaluation
The association mapping panel, mostly comprising genotypes from natural populations (Supplementary Table S1), displayed significant phenotypic variation for the traits under investigation.Table 1 presents the range, mean, standard deviation, and heritability values for different traits across two seasons.Days to flowering (DTF), plant height (PH), stem diameter (SD), panicle length (PL), seed yield/plant (SY), grain diameter (GD), and thousand-grain weight (TGW) ranged from 36 to 76 days, 24-184 cm, 2.6-15.6 mm, 7.8-51.6cm, 1.5-73.5 g, 1.3-2.4mm, and 0.8-4.6 g, respectively.The greatest variation was observed for plant height and seed yield, whereas grain diameter, thousand-grain weight, and days to flowering showed the lowest variation.The broadsense heritability for all traits ranged from 46 to 99%.The heritability of SD was found to be the minimum, whereas the heritability of TGW was up to 99%.All traits showed normal or near-normal distribution among the population lines as revealed by histogram of residuals, and normal Q-Q plot (Fig. 1).

Trait
Season Range Mean ± SD Heritability F-value

Sequencing and SNP analysis
Sequencing of 201 accessions revealed a mapping depth in the range of 6.44-15.06X, with an average of 10.95 X, whereas the genome coverage ranged from 55. 42 to 98.40%, with an average coverage of 97.11%.MAFfiltered SNPs were further filtered manually considering heterozygous loci as missing, which resulted in 383,251 high-quality SNPs.Assuming a total length of all 18 chromosomes of 1.2 Gb as reported by Jarvis et al. 10 , the average SNP density was ≈ 3.11 SNPs/10 kb, for which the highest and lowest marker densities were observed on chromosomes Cq1B (6.43 SNPs/10 kb) and Cq9A (2.10 SNPs/10 kb), respectively (Fig. 2a; Supplementary Table S2).We observed a high SNP density on the B genome (≈ 3.78 SNPs/10 kb) vis-à-vis the A genome (≈ 2.44 SNPs/10 kb).The chromosome-wise distribution of all SNPs is given in Supplementary Table S2.

Genome-wide association analyses of agronomic traits
Genomic associations were declared significant following FDR ≤ 0.1 and − Log10P ≥ 4 as a threshold with more than two SNP associations in a genomic region equivalent to the LD decay (118.5 kb).Marker-trait associations were identified for three out of seven agronomic traits studied and one biochemical trait (saponin content).Significant associations were found for DTF, PL, TGW, and saponin content, but not for PH, SD, and GD.We identified a total of 19 genomic regions associated with four traits: DTF, PL, TGW, and saponin content (Table 2, Supplementary Table S3).Genomic associations for DTF exhibited a significant effect across the two seasons as well as in the combined analysis, whereas other agronomic traits showed significant associations in either of the season and in the combined analysis also.
Genome wide association analysis for days to flowering resulted in at total of 117 marker trait association of which 74 SNPs has shown consistent associations with either both or one of the seasons as well as in the combined analysis (Table 2, Supplementary Table S3).Theses SNPs were located across 7 genomic regions on chromosome 3A, 4B, 5B, 6A, 7A, 7B, and 8B.The genomic region qDTF3A.1 and qDTF4B.1 comprising 5 and 4 SNPs with a phenotypic variance of upto11.8 and 15.4% respectively.The DTF QTL qDTF5B.1 comprised 25 SNPs in a 152.64-kb region explaining phenotypic variance of up to 17.01%.A total of four SNPs were found associated with DTF on chromosome 6A in a 27-kb genomic region and they explained phenotypic variance of up to 19.2%.Similarly, genomic regions on chromosomes 7A (qDTF7A.1)and 7B (qDTF7B.1)revealed association with DTF and explained phenotypic variance of up to 14% and 18.5%, respectively.The numbers of SNPs in qDTF7A.1 and qDTF7B.1 regions were 6 and 26, respectively.
A major-effect QTL was also identified for TGW on chromosome 1B (qTGW1B.1)during trial 1 (2019-2020) and combined analysis using BLUP values from trial 1 and trial 2. The genomic region qTGW1B.1 spans 130.79 kb harboring 12 SNPs explaining phenotypic variance of up to 16.5% and 17.8% in trial 1 and combined analysis respectively.Furthermore, significant genomic associations were observed for PL on chromosomes Cq3A, and Cq7B, but in only trial 2 and combined analysis explaining a phenotypic variance of up to 22.18%, as presented in Table 2.

Candidate genes for traits
To identify candidate genes underlying traits, we searched 118.5 kb upstream and downstream of the genomic regions associated with the traits assuming an LD decay of ~ 118.5 kb in our germplasm panel.Through GWAS analysis, we identified seven genomic regions associated with DTF.The genomic area qDTF4B.1 included 29 genes, including genes for Zinc-finger homeodomain protein 4 and PHD finger protein.The genomic region qDTF5B.1 (9.73-9.88Mb) consisted of 39 genes, including, among others, MYB3R5, FUT1, UGE1, HSFB2B, and CYTOCHROME P450 87A3.Downstream of qDTF5B.1, the transcriptional regulator gene SLK2 was located within the LD region.
The genomic region qDTF6A.1 contains a total of 21 genes (Supplementary Table S4); however, a gene encoding L10-interacting MYB domain-containing protein (LIMYB) within the identified genomic region was of importance for flowering.The genomic region on chromosome 7A (1.92-2.07Mb) included WUSCHEL-RELATED HOMEOBOX 3 (WOX3) gene and a gene encoding ankyrin repeat-containing protein.Apart from these genes found within the genomic region qDTF7A.1, the region downstream of the LD decay region contains a gene encoding the protein UPSTREAM OF FLC (UFC).Downstream of the genomic region qDTF8B.1 consisted of a gene encoding protein POLLENLESS 3 along with other 14 genes.
The genomic regions associated with thousand-grain weight (TGW) viz., qTGW1B.1 included gene encoding Agamous-like MADS-box protein AGL12 which was found to be of importance for grain weight.Among the two genomic regions associated with panicle length, the genomic region qPL3A1.1 included gene encoding FRL4B: FRIGIDA-like protein 4b and FAR1-RELATED SEQUENCE 7 protein whereas qPL7B.1 consists of gene encoding F-box/FBD/LRR-repeat protein.
Among nine genomic regions associated with saponin content, the genomic region qSPN5B.1 on chromosome Cq5B (8.90-9.54Mb)contained two copies of bHLH25 and several copies of UDP-GLYCOSYLTRANSFERASE (UGT ) genes.Other genomic regions associated with saponin content in quinoa seeds primarily contained genes involved in fatty acid and carbohydrate metabolism (Supplementary Table S4d).
Genes identified within the LD region of the associated genomic regions are given in Supplementary Table S4 and the major genes involved in controlling traits of interest appear in Table 3.

Discussion
The primary goal of our study was to identify reliable marker-trait associations for key agronomic and biochemical traits and to accelerate the development of quinoa varieties tailored for cultivation in areas with diverse environments.An early identification and selection of desirable traits, including those that appear late during plant development or at maturity, are crucial for advancing the breeding cycle.Genome-wide association studies using a diverse mapping panel offer a promising approach toward this goal because they do not require the development of permanent mapping populations (biparental or other complex populations) that consume much time.ICBA's gene bank has one of the largest quinoa collections outside South America.A diverse collection based on phenotypic and biochemical data was shortlisted to constitute an association mapping panel for further rigorous phenotyping over the years, and for re-sequencing the genomes of interesting quinoa accessions.

Phenotyping
The heritability estimates, range, and Q-Q plots of the phenotypic traits in our study clearly indicate the fitness of data for marker-trait analysis (Fig. 1).The economically important trait TGW revealed the highest heritability in both seasons, suggesting its suitability in breeding programs as an important selection criterion.Heritability of DTF was found to be relatively high (81-90%), while PH showed a slightly lower heritability (53-71%) (Table 1).Most such traits are robust and relatively little influenced by genotype × environment (G × E) interaction.Maldonado-Taipe et al. 13 also reported a high heritability of DTF (82.99%) and TGW (91.06%) in a biparental F 2:3 mapping population derived from Chilean (PI-614889) × Peruvian (CHEN-109) accessions.Another recent study also reported high heritability of DTF (91.0%) and TGW (89.7%) in a GWAS panel comprising 303 quinoa accessions 12 .However, the heritability of the PH trait reported by Maldonado-Taipe et al. 13   in their respective quinoa mapping populations showed a contrasting pattern.Heritability of PH in a biparental mapping population was 38.4% 13 , whereas in a GWAS panel it was 85.0% 12 .Al-Naggar et al. 14 also reported 60.7% heritability for PH in their studies.This moderate to high heritability of PH indicates its relatively low influence by the environment, although a more robust study involving multi-year and multi-location experiments could provide more robust information.Followed by TGW, GD also showed very high heritability, whereas the other three traits (SD, PL, and SY) revealed moderate to high heritability in both growing seasons.SD and PL also showed high correlation with PH over the seasons (70-89%), signifying the importance of all three traits for quinoa improvement programs through both direct and indirect selection (Supplementary Table S5).Nevertheless, none of the associations revealed confounding effects of PH on SD/PL or vice versa, which was surprising, suggesting that more detailed investigation is required to better understand these traits.Based on previous reports and our study, DTF and TGW are highly robust and reliable for marker-trait analysis in quinoa, followed by PH and PL.Interestingly, in our genomic association study, consistent and robust genomic regions observed over the seasons were evident only for DTF among all agronomic traits investigated.

SNP distribution across the quinoa genome
The overall average SNP density in our study was found to be 3.11 SNPs/10 kb and the B genome had a higher SNP density than the A genome.Patiranage et al. 12 reported an average SNP density of 2.39 SNPs/kb while performing whole-genome sequencing of 312 accessions on the Illumina platform, ranging from 0 to 122 SNPs/kb, and no significant difference was observed in SNP densities of the A and B genomes.The size of the B genome has been reported to be larger than that of the A genome in quinoa 10 .A previous report 12 and our study clearly indicate that the genome sizes and SNP densities are not correlated.Furthermore, both studies supported the view of possible recombination and chromosomal rearrangements between the A and B sub-genomes 10 .

Genetic diversity
In order to understand the genetic diversity of the germplasm used for our study, principal component analysis was conducted that broadly divided the whole population into three diverse groups: groups I, II, and III (Fig. 3a).Group I comprised mainly accessions from lowlands, whereas groups II and III had accessions from highlands.The majority of the lowland-specific group I accessions came from coastal areas of Chile, where temperature, light, and humidity conditions are significantly different from those of the Peruvian/Bolivian highlands.Genotypes categorized in groups II and III were mainly from the highlands of Peru and Bolivia.Recent reports with germplasm panels also divided the populations into those from highlands and lowlands, like our finding 11,12 .The first and second principal components in our study explained 37.65% and 9.42% of the total variation, respectively.In previous PCA studies 11,12 , PC1 and PC2 explained 23.35/31.6%and 9.45/6.80%variation, respectively, thereby showing a similar trend as our report.Also, similar to the analyses performed by Mizuno et al. 11 and Patiranage 12 , accessions categorized as "collected from USA" were spread in all three sub-groups in the PC analysis in our study (Fig. 3a).This raises considerable concern about the reliability of passport data obtained from the WSU repository about the correct origin of the quinoa accessions, as addressed before 11,12 .Unlike Mizuno et al. 11 , we could not differentiate the two highland groups, even though they showed a clear distinction in Fig. 3a.This might be due to either the higher movement of germplasm materials between southern and northern highland areas or inappropriate collection site information.Similar patterns and problems were reported in a recent study on deciphering the diversity of Iranian wheat landraces 15 .A well-organized systematic germplasm collection in centers of diversity established using advanced GIS tools should help in overcoming such problems.In the case of quinoa, this is very important as researchers have defined five distinct ecotypes of Chenopodium quinoa based on origin 16 .However, even using high-density marker platforms, we could not explain the genetic basis of those five groups.This information is important and should be investigated in detail in the future as it could be quite useful for the improvement of quinoa germplasm and its agricultural deployment.
A mean genome-wise LD in the GWAS panel was observed at 118.5 kb estimated as decayed half to its maximum, which is one of the most commonly used thresholds to define unlinked loci 17 .Interestingly, we observed an inverse relationship between population size and LD decay in our study, which corroborates results from previous reports 11,12 .The numbers of lines in the populations of Mizuno et al. 11 , Patiranage et al. 12 , and our study were 136, 312, and 201, respectively.LD decay was found to be minimal in the population with 312 lines 12 , Table 3. Major candidate genes located in the identified genomic regions associated with traits of interest.*The position of the gene IDs refers to their position on quinoa reference genome V2 (CoGe id 60716).Where DTF = days to flowering; TGW = thousand-grain weight; PL = panicle length; SPN represents saponin content in quinoa seeds.

Marker-trait associations
Marker-trait associations (MTAs) largely depend on the extent of LD decay in self-pollinated crop species.There are two contrasting reports, one suggesting the non-suitability of GWAS in quinoa for determining genomic associations, while another one strongly favors it 12 .Mizuno et al. 11 performed their study with a set of 136 genotypes phenotyped for six traits using a GBS-based assay with 5753 SNPs, whereas Patiranage et al. 12 conducted an analysis involving 312 germplasm accessions phenotyped for 17 traits and GWAS was done with 2.9 million SNP markers identifying 1480 significant MTAs.Our MTA study was conducted on a germplasm set of 187 lines phenotyped for seven traits using 383,251 SNP markers.We successfully identified a total of 387 genomic associations with four out of seven traits investigated (Supplementary Table S3).These three reports (including our current study) highlight the importance of marker density, population size, and phenotypic as well as existing genotypic variability of the target traits.
It was interesting to note that, out of 17 traits subjected to analysis by Patinarage et al. 12 , only DTF and TGW showed a consistent effect across two seasons.Our analysis also showed the same pattern only for DTF with consistency over years.In our study, TGW also revealed moderate to high correlation with GD (Supplementary Table S5), although no significant marker-trait associations were found for GD, possibly because of the degree of precision for trait phenotyping.Phenotypic variability of GD may largely depend on the type of scale and precision used for the measurements.The higher the precision, the greater the chance for identifying QTLs.We recommend higher precision for genome-wide association studies, especially for traits with smaller dimensions such as grain diameter.
Defining MTAs in GWAS is relatively challenging as compared to biparental mapping analysis because of threshold values.Following a highly stringent and conservative criterion, Bonferroni correction, one loses the majority of the MTAs 18 , which otherwise might be declared significant considering the FDR method 19 .A large number of studies are available in which investigators define a threshold of 0.0001 (or less) to declare significant MTAs 15 .Here, we followed a slightly modified approach in which we considered FDR ≤ 0.1 and − Log 10 P ≥ 4 as a threshold only when more than one SNP was found associated in a genomic region of 118.5 kb (as per LD decay results).Interestingly, qDTF5B.1 and qDTF7B.1 revealed 19 and 33 SNPs, respectively, associated significantly with DTF within the region (Table 2; Supplementary Table S3).This method can help greatly in developing diagnostic markers and deploying GWAS results in quinoa germplasm improvement processes.
Genomic associations for saponin content were identified on chromosomes 1B, 4A, 5A, and 5B (Table 2) based on previously reported saponin content and genotypic data for our GWAS population 20 .Jarvis et al. 10 and Patinarage et al. 12 reported a consistent and major-effect genomic association on chromosome 5B that corroborates our findings.Patinarage et al. 12 also reported inconsistent QTLs on chromosomes 1B, 4A, and 5A, as in our study.This suggests that these chromosomes might also harbor minor QTLs underpinning the trait.A more detailed analysis involving contrasting large biparental mapping populations would be beneficial for detecting minor QTLs with a higher confidence on these chromosomes for understanding the genetic basis of saponin content in quinoa seeds.

Candidate genes
Putative candidate genes were searched within LD region of associated genomic regions and homologs reported for controlling flowering time or floral development within the distinct genomic regions that were identified in our study.The genomic region qDTF4B.1 consisted of genes encoding Zinc-finger homeodomain protein 4 (ZHD4) and PHD finger protein.The ZHD4 also called as FLORAL TRANSITION AT THE MERISTEM2 (FTM2) has been reported playing an important role in regulating floral induction in plants 21 .The PHD finger protein plays a key function in the chromatin-mediated repression of flowering, ensuring the precise control of flowering time 22,23 .It has been reported to be involved in controlling both vernalization and photoperiod pathways in Arabidopsis 24 .Another important gene near qDTF8B.1 was Protein POLLENLESS 3 which is essentials for male fertility, especially for microspore and pollen grain production in Arabidopsis 25 .Among the genes located in the genomic region qDTF5B.1, the FUT1 gene encoding galactoside 2-alpha-L-fucosyltransferase regulates flower development in tobacco (Nicotiana tabacum) 26 while HEAT SHOCK TRANSCRIPTION FACTOR B2b (HsfB2b) acts as a transcriptional repressor of VIN3, a gene induced by long-term cold for flowering in Arabidopsis thaliana 27 .Further downstream of qDTF5B.1, a gene encoding transcriptional regulator SLK2 was located within the LD region.SLK2 functions together with co-regulator SEUSS in LEUNIG-regulated processes during flower development, thereby controlling flowering time 28 .The genomic region qDTF7A (1.92-2.07Mb) includes WUSCHEL-RELATED HOMEOBOX 3 (WOX3) and a gene encoding an ankyrin repeat-containing protein.
WOX3 genes are involved in the formation of lateral sepals and sepal margins in flowers 29 .Although ankyrin repeat-containing protein in quinoa has, to our knowledge, not been reported to be involved in flowering control, Arabidopsis plants with a mutated ankyrin repeat-containing protein showed delayed flowering 30 .The genomic region downstream of qDTF7A contained a gene encoding the protein UPSTREAM OF FLC (UFC).The UFC gene is adjacent to FLOWERING LOCUS C (FLC), which responds to vernalization and acts as a floral repressor in many temperate species 31,32 .
The genomic region qTGW1B.1 mapped for thousand-grain weight consists of 14 genes in the LD region.However, one gene encoding Agamous-like MADS-box protein AGL12 was of importance as it has been reported to be involved in numerous developmental process including seed and embryo development 33 .The genomic regions associated with panicle length consists of genes encoding FRL4B: FRIGIDA-like protein 4b and FAR1-RELATED SEQUENCE (FRS) 7 protein in genomic region qPL3A.1 and F-box/FBD/LRR-repeat protein in www.nature.com/scientificreports/qPL7B.1.The FRIGIDA-like protein and FAR1 proteins has been involved in controlling flowering time, leading to late flowering phenotype in plants and inducing the development of enlarged panicles 34,35 .Further, FRS members has been reported as the candidate genes for Panicle and Spikelet Degeneration (PSD) gene in rice 36 and has been mapped in qPL5 of rice 37 .F-box protein encoding genes has been reported to be involved in floral transition as well as panicle and seed development 38 .
The genomic region qSPN5B.1 on chromosome Cq5B (8.90-9.54Mb) associated with saponin content contains two copies of bHLH25 and several UGT (UDP-GLYCOSYLTRANSFERASE) genes.The bHLH25 transcription factor has been identified in several studies and reported as TRITERPENE SAPONIN BIOSYNTHESIS ACTIVATING REGULATOR 2 (TSAR2) controlling the production of saponin in quinoa seeds 10,12,13 .Apart from bHLH25, the genomic region qSPN5B.1 also contains several UDP-GLYCOSYLTRANSFERASE genes involved in the glycosylation of triterpenoid saponins [39][40][41][42] .The other genomic regions associated with saponin content in quinoa seeds primarily contain genes involved in fatty acid and carbohydrate metabolism.
The SNP markers and candidate genes identified in this study can be efficiently used to develop the molecular markers-based robust and reliable selection system that can further help to fasten the pace of conventional breeding efforts.This is crucial for the traits which appear only during late growth stages (flowering duration and plant height) or only after harvesting (saponin content).Early identification of desired segregants through molecular markers will allow us to keep only positive plants without maintaining full population that include unwanted genotypes and save the limited resources.Similarly crossing of the desired plants would be possible without waiting for the trait to appear during late growth stage.The developed improved quinoa genotypes will be easily accepted by farmers once the major bottlenecks in quinoa genotypes (saponin content and long duration) will be overcome.
Quinoa has earned special attention worldwide due to its exceptional nutritional quality and health benefits and its ability to adapt to marginal environments especially saline soils and drought stressed marginal agroecosystems.Most of the target regions need improvement of traits like shortening of growth period, sweet grain types, reduced water requirement and non-lodging type sturdy stem with medium plant height.Current varieties in the public domain do not have these desired traits hence farmers are discouraged from cropping quinoa either due to non-availability of preferred varieties or extra cost incurred on post-harvest processing.Based on these assumptions, we targeted to breed the desired varieties for the region which are high yielding, sweet genotype with early maturity.This will allow poor farmers to have access 'free to use' sweet quinoa varieties as all existing sweet quinoa varieties are from private sector and mostly poorly adapted to relatively hot regions.

Plant material and growth conditions
The gene bank of the International Center for Biosaline Agriculture (ICBA) (https:// www.biosa line.org/ abouticba/ facil ities/ geneb ank) contains more than 1000 Chenopodium spp.accessions.We previously evaluated a random set of 500 quinoa accessions for phenology at the ICBA farm during 2016-2017 and 470 out of 500 were analyzed for their triterpenoid saponin content using LC-MS 20 .To ensure a representative sample for our study, we selected 201 genotypes from the initial set of 500.We selected them based on inclusion of the full diversity spectrum available for all traits and diverse origins in a way that the panel represented all collection sites.Supplementary Table S1 provides information about the collection site of each accession in our study.
Of the 201 genotypes, 187 that had germinated and given consistent phenotypic data across trials were used for association analysis, whereas all 201 accessions were used for diversity studies and estimating LD decay (Supplementary Table S1).

Field experiments
Field experiments were conducted at the research farm facilities of the ICBA (latitude 25.0949; longitude 55.3899), located in Dubai, United Arab Emirates.The GWAS panel was phenotyped for various morphological traits across two seasons, 2019-2020 and 2020-2021.Because of the large number of accessions, we used an augmented randomized complete block design with five blocks, involving 190 test entries and six checks to compensate for the block effect.Each accession was assigned a plot size of 1.5 × 2.5 m 2 .Weather data were collected by a meteorological station during both growing seasons and are summarized in Supplementary Table S6.
Prior to sowing, we added and mixed poultry manure (Al Yahar organic manure, UAE) at an amount of 30 t/ha.Drip irrigation was employed maintaining a plant-to-plant spacing of 25 × 25 cm.For sowing, we used the dibbling method with a depth of 1-2 cm, with three to five seeds sown next to each dripper.Three weeks after sowing, the plants were thinned to maintain a single plant per dripper.Regular hand weeding was carried out to remove weeds and off-type plants.

Phenotypic and biochemical evaluation
Seven morphological traits-days to flowering (DTF), plant height (PH), main panicle length (PL), stem diameter (SD), seed yield per plant (SY), grain diameter (GD), and thousand-grain weight (TGW) were recorded to assess the variation among the quinoa genotypes; some of the data were previously reported 20 .DTF was recorded when approximately 50% of the plants had flowered.PH, PL, and SD were measured at the time of harvesting.SY was recorded after threshing the harvested plants and GD was determined using ten seeds, with their length averaged by dividing the total length by 10.TGW was determined by counting and weighing 1000 seeds using a seed counter.Basic statistical parameters and analysis of variance (ANOVA) were used with PBTools v1.4 (http:// bbi.irri.org/ produ cts).The following linear model was used to determine the observed response of the i-th treatment in the j-th block: Y ijt is response variable in equation where µ denotes the overall mean, T i denotes the effect of the genotypes, β j denotes the effect of the jth block, block within trial, genotype within trial and e ijt denotes the error variance of the ith treatment in the jth block of tth trial.Best linear unbiased predictions (BLUP) of the traits were calculated for all traits through PBTools v1.4 (http:// bbi.irri.org/ produ cts) using one stage multi-environment analysis.
The data for the total seed saponin content of 176 quinoa genotypes were used from Tabatabaei et al. 20 as the sum of intensity of 37 triterpenoid saponins that were detected through liquid chromatography-mass spectrometry (LC-MS) analysis.

Genome sequencing
DNA was extracted from flash-frozen leaf samples using a modified CTAB method 43 .The purity and quality of the isolated DNA were verified by agarose gel electrophoresis on 0.8% agarose gels and the concentration was determined on a Qubit 4 fluorometer using Qubit Broad Range Assay Kit (Thermo Fisher Scientific Inc., Waltham, MA, USA).One microgram of genomic DNA was mechanically fragmented to an average size of 250 bp using the Covaris® M220 Focused-ultrasonicator™ (Covaris, Woburn, MA, USA) and the size selection of fragmented DNA was done using MGIEasy DNAClean beads (MGI Tech, Shenzhen, China).A single-stranded circular DNA library was prepared using MGIEasy Universal DNA Library Prep Set Ver.1.0 following the manufacturer's standard protocol for a 250-bp insert size, followed by DNA nano ball (DNB) formation based on rolling circle amplification.The DNB was loaded into the flow cell (DNBSEQ-G400RS Sequencing Flow Cell Ver.3.0) and cPAS-based 100-bp paired-end sequencing was performed with DNBSEQ-G400RS High-Throughput Sequencing Set Ver.3.1 (MGI Tech).

Population analysis
Principal component analysis was performed using the filtered SNP set with GCTA software (version 1.93.2 beta).A neighbor-joining tree was constructed on the same SNP set using PHYLIP (version 3.69) 49 and the tree layout was generated using online tool iTOL (https:// itol.embl.de).

Linkage disequilibrium and GWAS analysis
The LD was determined by calculating the squared correlation coefficient (r 2 ) for all manually filtered SNPs.The pairwise LD values were determined in TASSEL 5 (ver.20230519) and the values were plotted against physical distance (bp) in R Studio following Remington et al. 50.The pattern of LD decay was determined as the distance at which LD values declined to half of their maximum (0.4691) value.The SNP density plot was created using R package CM-plot (https:// github.com/ YinLi Lin/ CMplot) to look at the number of SNPs within 0.5-Mb windows in all 18 chromosomes.Genome-wide association analysis was conducted with high-quality filtered SNPs using a mixed linear model.Marker-trait association was examined in TASSEL 5 (ver.20230519) (https:// www.maize genet ics.net/ tassel).The first five principal components were incorporated as covariates in the GWAS model as a fixed effect whereas kinship among the genotypes was incorporated as a random effect to consider the population stratification and control false positives in the marker-trait analysis.Kinship (K) was determined using the Centered IBS (identity by state) method 51 .Analysis to establish marker-trait association was carried out for seven agronomic traits (from two consecutive trials) and saponin content in seeds of individual genotypes 20 using a mixed linear model through TASSEL 5 (ver.20230519).The combined GWAS analysis was further carried out using BLUP values calculated from the trait values collected from both the years (Supplementary table S7a).Manhattan plots and Q-Q plots of each trait were drawn using the modified R package "qqman" 52 .
We calculated false discovery rate (FDR) with the Benjamini and Hochberg method on a per-chromosome basis and selected SNPs with FDR < 0.1 as significant.Only those genomic regions in which multiple SNPs (three or more) showed associations with − log10 (p-value) > 4 and FDR value < 0.1 within the LD region for either of the traits were declared to be significantly associated.The group of significantly associated SNPs based on their p-values across trials was identified as putative genomic regions associated with traits and was considered as a Y ijt = µ + T i + β j + (block : trial) + genotype : trial + e ijt

Figure 2 .
Figure 2. (a) SNP density across 18 chromosomes of quinoa representing the number of SNPs within a 0.5-Mbp window size; (b) genome-wide linkage disequilibrium decay of r 2 values against physical distance (bp); the LD decay has been considered as the distance at which r 2 dropped to half of its maximum value (0.4691).

Figure 3 .
Figure 3. (a) PCA analysis; (b) neighbor joining tree of association mapping panel.Group I contains the genotypes from lowlands majorly, whereas group II and III were dominated by accessions that originated in highlands.
and Patiranage et al.12

Figure 4 .
Figure 4. Manhattan and Q-Q plots of genome-wide association mapping of measured traits.The y-axis in each graph represents − log10P for the p-value of the MTAs, while chromosome numbers are indicated on the x-axis.The red dashed line presents the threshold − log10P value (= 4) which was used for declaring the significant associations along with another criterion i.e. number of SNPs within LD decay region.

Figure 5 .
Figure 5. Various genomic regions identified through GWAS associated with various traits of interest.Green; purple and red text of the genomic regions indicates the genomic region identified in both individual as well as combined analysis; in either of the individual season and combined analysis and through metabolomic data respectively. https://doi.org/10.1038/s41598-024-59565-8

Table 2 .
Marker trait associations for various agronomic and biochemical traits in quinoa.Where DTF = days to flowering; TGW = thousand-grain weight; PL = panicle length; SPN represents saponin content in quinoa seeds.

MTA name Chr Span (Mb) Interval (kb) No. of SNPs Phenotypic variance R 2 (%) Trial 1 Trial 2 Combined
11termediate with 201 lines (our study), and maximal with 136 lines11.This clearly underpins the importance of larger population size to minimize LD decay.